Entropy Guided Transformation Learning
Authors
Abstract
This work presents Entropy Guided Transformation Learning (ETL), a new machine learning algorithm for classification tasks. ETL generalizes Transformation Based Learning (TBL) by automatically solving the main TBL bottleneck: the construction of good template sets. We also present ETL Committee, an ensemble method that uses ETL as the base learner. The main advantage of ETL is that it is easy to apply to Natural Language Processing (NLP) tasks. Its modeling phase is quick and simple, requiring only a training set and a naive initial classifier. Moreover, ETL inherits TBL's flexibility to work with diverse feature types. We describe the application of ETL to four language-independent NLP tasks: part-of-speech tagging, phrase chunking, named entity recognition, and semantic role labeling. Overall, we apply it to thirteen different corpora in six languages: Dutch, English, German, Hindi, Portuguese, and Spanish. Our extensive experimental results demonstrate that ETL is an effective way to learn accurate transformation rules. Using a common parameter setting, ETL outperforms TBL with handcrafted templates on all four tasks. For Portuguese, ETL obtains state-of-the-art results on all tested corpora. Our experimental results also show that ETL Committee improves the effectiveness of ETL classifiers. Using the ETL Committee approach, we obtain performance competitive with the state of the art on all thirteen corpus-driven tasks. We believe that, by avoiding handcrafted templates, ETL extends the use of transformation rules to a greater range of NLP tasks.
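ETL combines decision trees with TBL. The quantity that guides such a tree, and hence the "entropy" in the name, is the information gain of each feature with respect to the correct class. The Python sketch below computes this quantity for a toy token-window dataset and ranks the features by it. It is a minimal illustration under stated assumptions, not the authors' implementation. The feature names are hypothetical: `w0` is the current word, `t0` is the tag proposed by a naive initial classifier, and `t-1` is the previous tag. The data and the helper functions `entropy`, `information_gain`, and `rank_features` are also hypothetical. The sketch stops at scoring features and does not extract full templates from a tree.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(rows, labels, feature):
    """Reduction in label entropy obtained by splitting the rows on one feature."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[row[feature]].append(label)
    total = len(labels)
    # Weighted entropy that remains after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def rank_features(rows, labels):
    """Features sorted by decreasing information gain (ties keep input order)."""
    features = list(rows[0].keys())
    return sorted(features, key=lambda f: information_gain(rows, labels, f), reverse=True)

# Hypothetical toy data: the naive initial classifier (t0) tags every token as NN,
# but the correct tag depends on the previous tag (t-1).
rows = [
    {"w0": "run", "t0": "NN", "t-1": "DT"},
    {"w0": "run", "t0": "NN", "t-1": "PRP"},
    {"w0": "walk", "t0": "NN", "t-1": "DT"},
    {"w0": "walk", "t0": "NN", "t-1": "PRP"},
]
labels = ["NN", "VB", "NN", "VB"]

ranking = rank_features(rows, labels)
print(ranking)  # ['t-1', 'w0', 't0']
```

In this toy data, only `t-1` separates the classes, with a gain of 1 bit. `w0` and `t0` have zero gain at this split. A full tree would recurse inside each branch and keep splitting until the classes separate.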
Similar resources
Phrase Chunking Using Entropy Guided Transformation Learning
Entropy Guided Transformation Learning (ETL) is a new machine learning strategy that combines the advantages of decision trees (DT) and Transformation Based Learning (TBL). In this work, we apply the ETL framework to four phrase chunking tasks: Portuguese noun phrase chunking, English base noun phrase chunking, English text chunking and Hindi text chunking. In all four tasks, ETL shows better r...
Rule and Tree Ensembles for Unrestricted Coreference Resolution
In this paper, we describe a machine learning system based on rule and tree ensembles for unrestricted coreference resolution. We use Entropy Guided Transformation Learning (ETL) and Decision Trees as the base learners, and, respectively, ETL Committee and Random Forest as ensemble algorithms. Our system is evaluated on the closed track of the CoNLL 2011 shared task: Modeling Unrestricted Coref...
Quotation Extraction for Portuguese
Quotation extraction consists of identifying quotations and their authors. In this work, we present a Quotation Extraction system for Portuguese that is based on Entropy Guided Transformation Learning, a supervised Machine Learning algorithm. This is the first system that uses a Machine Learning approach for Portuguese. In order to train and evaluate the proposed system, we build the GLOBOQUOTE...
Portuguese Language Processing Service
Current Natural Language Processing tools provide shallow semantics for textual data. This kind of knowledge could be used in the Semantic Web. In this paper, we describe F-EXT-WS, a Portuguese Language Processing Service that is now available on the Web. The first version of this service provides Part-of-Speech Tagging, Noun Phrase Chunking and Named Entity Recognition. All these tools were b...
Robot Speech Learning via Entropy Guided LVQ and Memory Association
The goal of this project is to teach a computer-robot system to understand human speech through natural human-computer interaction. To achieve this goal, we develop an interactive and incremental learning algorithm based on entropy-guided LVQ and memory association. Supported by this algorithm, the robot has the potential to learn unlimited sounds progressively. Experimental results of a multili...